feat(x/evm): migrate x/evm metrics to OpenTelemetry Meter API by amir-deris · Pull Request #3423 · sei-protocol/sei-chain

amir-deris · 2026-05-12T21:05:03Z

Summary

Migrates telemetry in the x/evm/keeper and x/evm/ante packages from the legacy telemetry/utilmetrics helpers to the standardized OpenTelemetry Meter API, following the same pattern established in #3396 for app and app/ante.

Adds x/evm/keeper/metrics.go with a struct-based OTel instrument set registered via otel.Meter("evm_keeper"), initialized once via InitEvmKeeperMetrics() called from app.go after SetupOtelMetricsProvider()
Adds x/evm/ante/metrics.go with an evmAnteMetrics struct via otel.Meter("evm_ante"), initialized once via InitEvmAnteMetrics()
Replaces telemetry.ModuleMeasureSince / utilmetrics / seimetrics.SafeTelemetry* calls in all target files with dual-emitted OTel instruments; legacy calls remain with TODO(PLT-330) markers until dashboards are migrated
Threads ctx.Context() / goCtx through all .Record() and .Add() call sites
Disambiguates the evm_association_error_total counter that was registered by both keeper and ante into evm_keeper_association_error_total and evm_ante_association_error_total

New metrics (OTel naming convention, exported via the process-wide MeterProvider)

ABCI phase durations — histograms bucketed at fine-grained ms thresholds:

evm_abci_begin_blocker_duration_seconds
evm_abci_end_blocker_duration_seconds

Block fee:

evm_block_base_fee — synchronous gauge set each EndBlock

EVM transaction counters:

evm_panics_total
evm_errors_total — by type label (state_transition, stateDB_finalize, write_receipt, apply_message, vm_execution)
evm_receipt_status_total — by status label (success, failed)

Association errors:

evm_keeper_association_error_total — by scenario and type labels (emitted from keeper)
evm_ante_association_error_total — by scenario and type labels (emitted from ante)

Zero-storage cleanup:

evm_zero_storage_processed_keys_total
evm_zero_storage_pruned_keys_total
evm_zero_storage_pruned_bytes_total

Ante nonce/fee tracking:

evm_pending_nonce_total — by event label (added, expired, rejected, accepted)
evm_nonce_mismatch_total — by too_high bool attribute (true = nonce above expected, false = nonce below expected)
evm_effective_gas_price — histogram of effective gas price per EVM transaction

Note

Medium Risk
Touches EVM ante/keeper hot paths (BeginBlock/EndBlock, tx processing, nonce/fee checks) to add new OTel metric emissions; risk is mainly around performance/label correctness and potential numeric conversions for gauges/histograms.

Note

Medium Risk
Touches EVM ante/keeper hot paths (tx processing and BeginBlock/EndBlock) to add new OpenTelemetry metric emissions alongside legacy metrics; risk is mainly performance overhead and correctness of labels/value conversions (big.Int→float64).

Overview
Adds new OpenTelemetry Meter instrument sets for x/evm/ante and x/evm/keeper (new metrics.go files) and wires them into EVM hot paths.

Keeper now records BeginBlock/EndBlock duration histograms, base-fee gauge values (with safe big.Int→float64 conversion), tx panic/error/receipt-status counters, association-error counters, and zero-storage cleanup counters; ante now records effective gas price histograms plus nonce-mismatch and association-error counters.

Legacy utils/metrics/SafeTelemetry* calls remain in place but are renamed/aliased and annotated with TODO(PLT-330) while dashboards migrate.

^{Reviewed by Cursor Bugbot for commit ba7a653. Bugbot is set up for automated code reviews on this repo. Configure here.}

github-actions · 2026-05-12T21:06:10Z

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`✅ passed`	`✅ passed`	`✅ passed`	May 14, 2026, 6:23 PM

codecov · 2026-05-12T21:08:53Z

Codecov Report

❌ Patch coverage is 69.01408% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.23%. Comparing base (0f16b6d) to head (ba7a653).

Files with missing lines	Patch %	Lines
x/evm/keeper/metrics.go	38.46%	6 Missing and 2 partials ⚠️
x/evm/ante/fee.go	45.45%	5 Missing and 1 partial ⚠️
x/evm/keeper/msg_server.go	62.50%	6 Missing ⚠️
x/evm/ante/metrics.go	50.00%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3423   +/-   ##
=======================================
  Coverage   59.23%   59.23%           
=======================================
  Files        2116     2118    +2     
  Lines      175440   175463   +23     
=======================================
+ Hits       103916   103932   +16     
- Misses      62460    62463    +3     
- Partials     9064     9068    +4

Flag	Coverage Δ
sei-chain-pr	`72.53% <69.01%> (?)`
sei-db	`70.41% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
x/evm/ante/preprocess.go	`80.23% <100.00%> (+0.23%)`	⬆️
x/evm/ante/sig.go	`83.78% <100.00%> (+0.92%)`	⬆️
x/evm/keeper/abci.go	`59.32% <100.00%> (+3.76%)`	⬆️
x/evm/keeper/evm.go	`74.81% <100.00%> (+0.18%)`	⬆️
x/evm/keeper/storage_cleanup.go	`89.39% <100.00%> (+0.50%)`	⬆️
x/evm/ante/metrics.go	`50.00% <50.00%> (ø)`
x/evm/ante/fee.go	`68.49% <45.45%> (-4.53%)`	⬇️
x/evm/keeper/msg_server.go	`80.36% <62.50%> (+1.94%)`	⬆️
x/evm/keeper/metrics.go	`38.46% <38.46%> (ø)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

amir-deris · 2026-05-12T22:31:44Z

+		else \
+			DETACH_FLAG=""; \
+		fi; \
+		DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) NUM_ACCOUNTS=10 INVARIANT_CHECK_INTERVAL=${INVARIANT_CHECK_INTERVAL} UPGRADE_VERSION_LIST=${UPGRADE_VERSION_LIST} MOCK_BALANCES=${MOCK_BALANCES} GIGA_EXECUTOR=${GIGA_EXECUTOR} GIGA_OCC=${GIGA_OCC} RECEIPT_BACKEND=${RECEIPT_BACKEND} AUTOBAHN=${AUTOBAHN} GIGA_STORAGE=${GIGA_STORAGE} docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up --no-attach grafana --no-attach prometheus $$DETACH_FLAG


Grafana and Prometheus were adding a lot of noise to the logs so added --no-attach flag. Their logs can still be viewed with docker logs sei-grafana and docker logs sei-prometheus

amir-deris · 2026-05-12T22:39:44Z

  evaluation_interval: 15s

 scrape_configs:
-  - job_name: 'cryptosim'


I will rollback these changes to existing prometheus and grafana yaml files and create new configs.

bdchatham · 2026-05-13T00:46:10Z

+apiVersion: 1
+providers:
+  - name: default
+    orgId: 1
+    folder: ""
+    type: file
+    disableDeletion: false
+    updateIntervalSeconds: 30
+    options:
+      path: /var/lib/grafana/dashboards
+      foldersFromFilesStructure: false


I don't have context on these dashboards here. Are these to spin up dashboards for the docker compose personal stack? Seems like yes but want to confirm.

We're creating a mirror in the platform deployment, is that right?

Actually these are for local (docker) environment. So the idea is having a new target in MakeFile so we can run make docker-cluster-start-monitoring locally with docker compose and have a prometheus and grafana local containers and Grafana user interface to investigate metrics, do performance measurement of our changes without having to release them.
We currently don't have a clean setup locally for this. We can run scripts like ./docker/monitornode/scripts/start-prometheus.sh but their ports conflict with existing setup and we need to shut them down manually.

bdchatham · 2026-05-13T00:48:25Z

@@ -0,0 +1,8 @@
+apiVersion: 1


Lots of docker compose pieces here. I'm slightly leaning towards us not creating these and just supporting them on the new platform Grafana. Seems like complexity & scope that isn't worth it's weight to me although I like that we are thinking about the tooling end of this.

Thanks for feedback. I think these help with testing locally and iterating quickly for measuring performance changes against a metric. If that is ok, I would rather keep it 😊

bdchatham · 2026-05-13T01:00:14Z

 					// this nonce has already been mined, we cannot accept it again
-					metrics.IncrementPendingNonce("rejected")
+					utilmetrics.IncrementPendingNonce("rejected") // TODO(PLT-330): remove once evm_pending_nonce_total verified
+					evmAnteMetrics.pendingNonce.Add(ctx.Context(), 1, otelmetric.WithAttributes(attribute.String("event", "rejected")))


nit: you could define the attributes once and reuse them since they are the same each time. Prevents creating a now struct each time to represent this and just reuses an immutable instance.

Should apply to all of these that have a known set of possible values ahead of time.

bdchatham · 2026-05-13T01:02:05Z

+		cause := "too_low"
+		if tooHigh {
+			cause = "too_high"


nit: could simplify to "lower" and "higher" just too simplify use for dashboards or tools.

bdchatham · 2026-05-13T01:05:03Z

+func mustAnteMetric[V any](v V, err error) V {
+	if err != nil {
+		panic(err)
+	}
+	return v


nit: you're doing this but in different ways across packages. Consider consolidating into a single place for all your metrics structs to reuse.

Thanks for feedback. Per discussion with @masih , it seems in GO the emphasis is on packages being independent and not have coupling for code reuse.

bdchatham · 2026-05-13T01:07:27Z

+# Start 4-node cluster with Prometheus and Grafana monitoring
+docker-cluster-start-monitoring: docker-cluster-stop build-docker-node
+	@rm -rf $(PROJECT_HOME)/build/generated
+	@mkdir -p $(shell go env GOPATH)/pkg/mod
+	@mkdir -p $(shell go env GOCACHE)
+	@cd docker && \
+		if [ "$${DOCKER_DETACH:-}" = "true" ]; then \
+			DETACH_FLAG="-d"; \
+		else \
+			DETACH_FLAG=""; \
+		fi; \
+		DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) NUM_ACCOUNTS=10 INVARIANT_CHECK_INTERVAL=${INVARIANT_CHECK_INTERVAL} UPGRADE_VERSION_LIST=${UPGRADE_VERSION_LIST} MOCK_BALANCES=${MOCK_BALANCES} GIGA_EXECUTOR=${GIGA_EXECUTOR} GIGA_OCC=${GIGA_OCC} RECEIPT_BACKEND=${RECEIPT_BACKEND} AUTOBAHN=${AUTOBAHN} GIGA_STORAGE=${GIGA_STORAGE} docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up --no-attach grafana --no-attach prometheus $$DETACH_FLAG
+.PHONY: docker-cluster-start-monitoring
+
+# Stop monitoring containers (Prometheus and Grafana) and cluster
+docker-cluster-stop-monitoring:
+	@cd docker && DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) docker compose -f docker-compose.yml -f docker-compose.monitoring.yml down
+.PHONY: docker-cluster-stop-monitoring


Any issues with testing this in your Harbour personal stack? Let me know if I can help. Ideally it's easy enough that you can use that instead of rolling net-new infra like this to test your changes

I believe this local version could allow faster iterations and experiments, to make sure there is value in the optimization example being made. Once we confirm, we can move to Harbour and test that modification in a more prod like environment. It also can lower our AWS cost to not take it to Harbour when there is no observed improvement.

masih

Left a couple of recommendations. This PR is still a PR to a non-main branch. Could we have this based against main please for the final review?

Thanks Amir!

masih · 2026-05-14T10:04:12Z

@@ -0,0 +1,8 @@
+apiVersion: 1


@amir-deris could you move the docker grafana stuff to a separate PR please?

Yes sir! will do that.

masih · 2026-05-14T10:07:13Z

+		utilmetrics.IncrementNonceMismatch(tooHigh) // TODO(PLT-330): remove once evm_nonce_mismatch_total verified
+		cause := "lower"
+		if tooHigh {
+			cause = "higher"


Turn tooHigh into a bool attribute instead if stringify-ing it?

…eus containers with docker compose

…ing existing ones

…d fixing gas big int overflow issue

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit ba7a653. Configure here.}

cursor · 2026-05-14T18:27:34Z

+			"evm_pending_nonce",
+			metric.WithDescription("EVM pending nonce events by type (added, expired, rejected, accepted)"),
+			metric.WithUnit("{count}"),
+		)),


Unused pendingNonce metric registered but never emitted

Low Severity

The pendingNonce counter is registered in evmAnteMetrics (allocating OTel resources at init) but evmAnteMetrics.pendingNonce is never referenced anywhere in the codebase. The corresponding legacy function IncrementPendingNonce in utils/metrics is also never called. This is dead code that registers a metric instrument at startup for no purpose. The same pattern exists in app/ante/metrics.go with app_pending_nonce, which is also unused.

^{Reviewed by Cursor Bugbot for commit ba7a653. Configure here.}

cursor Bot reviewed May 12, 2026

View reviewed changes

Comment thread x/evm/keeper/metrics.go Outdated

amir-deris changed the title ~~Amir/plt 329 migrate x evm metrics to otel~~ feat(x/evm): migrate x/evm metrics to OpenTelemetry Meter API May 12, 2026

amir-deris self-assigned this May 12, 2026

amir-deris added the non-app-hash-breaking label May 12, 2026

amir-deris commented May 12, 2026

View reviewed changes

amir-deris requested a review from bdchatham May 12, 2026 22:32

cursor Bot reviewed May 12, 2026

View reviewed changes

Comment thread x/evm/keeper/metrics.go Outdated

Comment thread x/evm/ante/fee.go Outdated

amir-deris requested a review from masih May 12, 2026 22:32

amir-deris changed the base branch from main to amir/plt-323-migrate-app-metrics-to-otel-meter-api May 12, 2026 22:34

amir-deris commented May 12, 2026

View reviewed changes

cursor Bot reviewed May 12, 2026

View reviewed changes

Comment thread x/evm/keeper/metrics.go Outdated

bdchatham reviewed May 13, 2026

View reviewed changes

amir-deris force-pushed the amir/plt-329-migrate-x-evm-metrics-to-otel branch from 96669ca to 6eb1415 Compare May 13, 2026 16:15

cursor Bot reviewed May 13, 2026

View reviewed changes

Comment thread Makefile Outdated

amir-deris force-pushed the amir/plt-329-migrate-x-evm-metrics-to-otel branch from 6eb1415 to f9e0114 Compare May 13, 2026 18:04

cursor Bot reviewed May 13, 2026

View reviewed changes

Comment thread x/evm/keeper/metrics.go Outdated

cursor Bot reviewed May 13, 2026

View reviewed changes

Comment thread x/evm/ante/metrics.go

amir-deris force-pushed the amir/plt-329-migrate-x-evm-metrics-to-otel branch 2 times, most recently from 2d4f018 to d2c9c1b Compare May 13, 2026 22:55

cursor Bot reviewed May 13, 2026

View reviewed changes

Comment thread x/evm/keeper/abci.go Outdated

masih reviewed May 14, 2026

View reviewed changes

Base automatically changed from amir/plt-323-migrate-app-metrics-to-otel-meter-api to main May 14, 2026 10:18

amir-deris added 6 commits May 14, 2026 08:07

plt-329 Added otel metrics for x/evm package

84abd52

plt-329 Added new makefile target for bringing up grafana and prometh…

782502e

…eus containers with docker compose

plt-329 Made metric names unique

57383f6

plt-329 Used new configs for grafana and prometheus instead of modify…

0016a38

…ing existing ones

plt-329 Code cleanup and refactoring

711dd14

plt-329 Cleaning up grafana / prometheus configs, updating readme, an…

74ac3fe

…d fixing gas big int overflow issue

amir-deris added 3 commits May 14, 2026 08:07

plt-329 Addressing some pr feedback

2774355

plt-329 Added unit to gas price histogram

33413a6

plt-329 fixed double import issue

a5c2a10

amir-deris force-pushed the amir/plt-329-migrate-x-evm-metrics-to-otel branch from 4b6883f to a5c2a10 Compare May 14, 2026 15:07

Rolled back docker related changes

5356b44

cursor Bot reviewed May 14, 2026

View reviewed changes

Comment thread x/evm/ante/sig.go Outdated

Comment thread x/evm/keeper/metrics.go

addressing pr feedback

823b86e

masih approved these changes May 14, 2026

View reviewed changes

Comment thread x/evm/keeper/metrics.go

cursor Bot reviewed May 14, 2026

View reviewed changes

Comment thread x/evm/ante/fee.go

Fixing bytes pruned unit

0b2b3a4

bdchatham approved these changes May 14, 2026

View reviewed changes

amir-deris added 2 commits May 14, 2026 11:15

Merge branch 'main' into amir/plt-329-migrate-x-evm-metrics-to-otel

53b8dab

Removed unused vars

ba7a653

cursor Bot reviewed May 14, 2026

View reviewed changes

amir-deris enabled auto-merge May 14, 2026 18:29

amir-deris added this pull request to the merge queue May 14, 2026

Merged via the queue into main with commit d79122f May 14, 2026
42 checks passed

amir-deris deleted the amir/plt-329-migrate-x-evm-metrics-to-otel branch May 14, 2026 18:53

amir-deris mentioned this pull request May 14, 2026

feat(metrics): migrate x/oracle, x/epoch, x/mint to OpenTelemetry #3439

Merged

Conversation

amir-deris commented May 12, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New metrics (OTel naming convention, exported via the process-wide MeterProvider)

Uh oh!

github-actions Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

masih left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 14, 2026

Choose a reason for hiding this comment

Unused pendingNonce metric registered but never emitted

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

amir-deris commented May 12, 2026 •

edited by cursor Bot

Loading

github-actions Bot commented May 12, 2026 •

edited

Loading

codecov Bot commented May 12, 2026 •

edited

Loading

Unused `pendingNonce` metric registered but never emitted